Abstract

This report presents an example of regression analysis
which illustrates the major judgmental considerations used in the
development of a cost estimating relationship. The example
used is the development of hardware costs of turbine aircraft
engines. The methodology discussed is most useful for "quick
reaction" studies and has been used by Headquarters, US Army
Materiel Command for this purpose. Particular points discussed
are: scatter diagrams, net scatter diagrams, causal requirements,
combinations of variables, and sample selection.

APPLICATION OF REGRESSION ANALYSIS TO HARDWARE COST ESTIMATION


The purpose of this technical report is to illustrate
the application of regression analysis to hardware cost
estimation. The example given in this report uses hypothetical
data which have been generated to best illustrate the analytical
methodology being presented. These data have been selected
for two reasons. First, there would be a possible breach
of proprietary information if actual AMC costing equations were
used. Second, it was necessary to select the data so that all
important considerations could be illustrated clearly. A
working knowledge of basic statistics is assumed in this
discussion.

The example presented in this study is the estimation of
hardware cost of turbine aircraft engines. Typically, a request
would have been received for cost information which could be
used for program and budget purposes and for prediction of
possible cost overruns. The information was to be developed
on the T3/5 family of engines with emphasis on the TX, a follow-on
engine to be produced by the same contractor which produced
all other family members. This study was selected because it
illustrates most of the considerations to be made during a
statistical cost analysis. Experience gained in the use
of scatter diagrams and regression analysis will be presented.

In solving a statistical analysis problem of this type,
the first and most important step is to gather available,
historical, analogous data. The Army had procured several
T3/5 engines for which historical costs were known. Based
on data from the successive contracts for each model, slopes
of the respective learning curves could also be derived.
From this data the cost of the hundredth unit for each model
was derived as the comparable cost for the various engines.
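The derivation of a learning curve slope and a 100th unit cost can be sketched as follows. This is a minimal illustration, not the report's own computation: it assumes the cumulative average form (cost = a * quantity^b, with slope defined as 2^b), and the contract quantities and the first-unit value are made-up numbers placed exactly on a 90 percent curve. The function name `fit_learning_curve` is hypothetical.

```python
import math

def fit_learning_curve(points):
    """Fit cum_avg_cost = a * quantity**b by least squares on the logs."""
    xs = [math.log(q) for q, _ in points]
    ys = [math.log(c) for _, c in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Made-up contract history lying exactly on a 90 percent curve (not report data).
true_b = math.log2(0.90)
contracts = [(q, 150000.0 * q ** true_b) for q in (25, 75, 200)]

a, b = fit_learning_curve(contracts)
slope = 2 ** b               # cost ratio each time the quantity doubles
cost_100th = a * 100 ** b    # 100th unit cumulative average cost
print(f"slope = {slope:.3f}, 100th unit cumulative average cost = {cost_100th:,.0f}")
```

With real contract data the points would not fall exactly on a curve, and the fitted slope would be an estimate rather than an exact recovery.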


Figure  1 

Graphic Representation of 100th Unit Cumulative Average
Costs and Learning Curves

Figure 1 illustrates that the slopes have become shallower
in this case for the newer engines. Therefore, a tentative
slope for the new TX could be presumed to be even shallower.
When the 100th unit cost of the TX is calculated the associated
learning curve may be similar to the dashed line shown for the
TX.


Table  1 

Hypothetical  Turbine  Aircraft  Engine  Data  Base 


Table 1 shows the collection of all available data for turbine
aircraft engines including performance data, technical data and
100th unit cost data. The list of engines for which data is
available includes all those which show similar characteristics
to those of the TX, for which cost is to be predicted. The
regression analysis developed here will relate one or more of
these characteristics of the engines to their cost in a cost
estimating relationship (CER).

The first step in CER development is the judgmental selection
of all systems which are similar to the specific system being
studied. The JET-12a-1 and -2 are not turbine engines but jet
engines, so that data may be discarded. The T4 and T8 engines
were not produced by the same contractor who produced all other
T3/5 engines. This data can therefore be tentatively rejected
even though the cost and performance data are within the range
of the T3/5 data. The four engines just discussed are not
considered to have homogeneous or analogous characteristics
since the costs requested are those of one particular contractor.
After the rejection of four engines, there are data on seven
engines remaining and this is a sufficiently large sample to
form the basis of a CER. If there were data on only two or three
T3/5 engines, consideration would have been given to using
the similar T4 and T8 engines to provide a larger sample size.
A sample must be large enough to provide statistical confidence
in the resulting CER.

The next step is to examine available performance and
technical characteristics to assure that the variables may
actually be used to predict costs. In this case, installed
weight must be rejected even though there appears to be a
relationship between cost and this characteristic. Rejection
is necessary because all new engines are being installed in
lighter, more expensive mounts so that costs would still
continue to rise as installed weight drops. We therefore
reject installed weight as a future predictor. The type of
mount is independent of the engine used. An old engine could
also use a new mount.

Figure 2

Turbine Aircraft Engine Installed Weight vs. Cost

Figure 2, which shows installed weight versus cost, is
shown only to illustrate the point that data which appears
excellent may have to be rejected on closer evaluation.

Also, notice that the T4 and T8 engines which are produced
by another contractor have no apparent relationship to the
T3/5 group. After the deletion of installed weight, the
remaining independent variables are revolutions per minute,
military shaft horsepower, specific fuel consumption and
number of compressor stages.

One must be very careful to insure that there is a
cause and effect relationship between the independent
variables and cost. As a recent Cost-Effectiveness Newsletter*
stated, "There is probably a good correlation between men's
shoe sizes and their heights. Therefore, one likely way to
reduce a man's height is to chop off his toes." The moral
is: make sure the long feet actually cause height before you
cut off the man's toes.

*The CE Newsletter, Volume 3, Number 1, February 1968, page 3.

Figure 3 shows shaft horsepower plotted against cost. An
analyst familiar with mathematical functions may observe the
possibility that a square root function may "fit" this data.*
The data for this variable can easily be transformed into a
square root function and plotted again to check this assumption.


Figure  3 

Shaft  Horsepower  vs.  Cost 


The data is transformed into a square root function in
Figure 4. A general square root function seems to "fit" the
transformed data fairly well. It will be useful to consider
the square root of shaft horsepower as an independent variable,
a variable which logically may cause cost and therefore be a
good predictor of cost.

*See Appendix for explanation of use of nonlinear terms.
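The check described above, transforming shaft horsepower to its square root and refitting, can be sketched in a few lines. The horsepower and cost values below are made up so that cost is exactly linear in the square root of horsepower; they are not the Table 1 data. The helper `simple_ols` is a hypothetical name for an ordinary one-variable least squares fit.

```python
import math

def simple_ols(xs, ys):
    """Return intercept, slope and R-squared of a one-variable least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - sse / sst

# Made-up data: cost is exactly 20000 + 1500 * sqrt(shaft horsepower).
shp = [400, 900, 1600, 2500, 3600]
cost = [20000 + 1500 * math.sqrt(h) for h in shp]

a_raw, b_raw, r2_raw = simple_ols(shp, cost)                        # untransformed
a_t, b_t, r2_t = simple_ols([math.sqrt(h) for h in shp], cost)      # transformed

print(f"R-squared, raw horsepower:  {r2_raw:.4f}")
print(f"R-squared, sqrt horsepower: {r2_t:.4f}")
```

A higher R-squared after the transformation supports the square root hypothesis, but with a sample this small it is only one piece of evidence alongside the scatter diagram and the causal argument.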


Figure  4 

Square Root of Shaft Horsepower vs. Cost

Figure 5 shows revolutions per minute times number of
stages plotted against cost. This combination was chosen
because it represents an engineering relationship. Combinations
based on engineering relationships often make good CERs
because they are often the relationships which actually
cause cost. The observations on this graph are very scattered
and are not considered a good possibility for mathematical
expression, so this relationship was rejected.

After all likely variables, transformations of variables
and combinations of variables have been chosen, a least
squares line or multiple linear regression is computed using
standard statistical regression techniques. These techniques
are well known, are covered in standard textbooks such as
those referenced at the end of this report, and will not be
given here.


Figure  5 

Revolutions per Minute vs. Cost

What has been done so far is not difficult or time consuming.
However, to adequately consider all possible relationships and
derive the best available CER requires many scatter diagrams
and calculations of multiple variable regression lines. If
multiple variables are used, simple one variable scatter
diagrams of the type illustrated contain the effects of more
than one variable and become useless to the analyst using
visual inspection methods.

With the assistance of a small computer these problems
can be solved quickly. The rush requirement is especially
typical of the Army environment where all cost analysis is
due yesterday. Since a computer of any size is usually a
limited resource and access time is slow, the cost analyst
must make the best use of all available techniques to expedite
finding the one or more variables that best explain cost.

The procedure to be described requires a minimum of two
multiple regressions. If more computation effort can be
afforded so much the better. Planning the procedure to be
used will afford good, timely results.

The first step in this procedure is to compute a linear
multiple regression using as many variables or combinations of
variables as good judgment and the computer will allow.
Hopefully, the program will discriminate and select only
significant variables. Significant variables are those which
in some way relate to or explain changes in cost.

After the linear regression line has been computed it
is used to recompute the costs of all of the engines in the
sample. These new costs are called "computed costs" as opposed
to the observed or actual costs. The difference between computed
and observed costs for each observation is called the residual.
These residual costs with the proposed regression line will
be used to help identify as yet undiscovered but meaningful
relationships.
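The computed costs and residuals just described can be illustrated with a small pure-Python multiple regression. Everything here is an assumption made for illustration: the seven rows of RPM (thousands), shaft horsepower (hundreds) and cost (thousands of dollars) are invented, and `solve`, `fit_ols` and `predict` are hypothetical helper names. The fit uses the normal equations, which is adequate for a handful of well-scaled variables.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] from the least squares normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    """Computed cost for one observation."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

# Made-up sample of seven engines (not the report's Table 1).
rpm_k = [6.0, 6.5, 7.2, 7.0, 8.1, 8.4, 9.0]          # RPM, thousands
shp_h = [9.0, 11.0, 12.0, 16.0, 15.0, 20.0, 22.0]    # shaft horsepower, hundreds
cost_k = [57.5, 61.5, 70.3, 77.5, 80.4, 92.6, 101.5]  # observed cost, $ thousands

rows = list(zip(rpm_k, shp_h))
coef = fit_ols(rows, cost_k)
computed = [predict(coef, r) for r in rows]
residuals = [obs - fit for obs, fit in zip(cost_k, computed)]

for obs, fit, res in zip(cost_k, computed, residuals):
    print(f"observed {obs:7.1f}  computed {fit:7.2f}  residual {res:6.2f}")
```

Because the regression includes a constant term, the residuals sum to zero apart from rounding; patterns in their signs across one variable are what the analyst looks for next.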




The scatter diagrams shown in earlier figures showed the
variation or residual about a least squares line computed with
only one independent variable. This variation contained no
effects from other variables but did include an error term.
The multiple regression line now contains the influence of one
or more significant variables relating to cost. These residuals
are determined by all the independent variables included in
the line plus an error term.

It is possible to isolate and look at one variable at a
time on a "net" scatter diagram by plotting the regression
line at the mean of all but one of the independent variables
and diagramming the one isolated variable versus cost.


The two-variable net scatter diagram illustrated in Figure
6 for shaft horsepower and cost nets out, or eliminates, variation
caused by other independent variables so that the true relationship
between cost and shaft horsepower can be studied in a manner
similar to the previous single-variable scatter diagrams.

Here the regression equation (or tentative CER) is plotted
with all other independent variables valued at their mean so
the scatter of residuals about the line can be studied for some
clue as to the true function of only shaft horsepower and
cost. In Figure 6, an overlaid general square root function (line)
illustrates a good "fit" to the data. Therefore, shaft horsepower
should be considered as a good predictor of cost when transformed
into a square root function.
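One way to build the points for a net scatter diagram is sketched below. It is an interpretation of the procedure described above, not the report's program: the regression line is evaluated with every variable except the isolated one held at its sample mean, and each observation is plotted at that line plus its own residual. The coefficients are assumed values standing in for a previously fitted tentative CER, the data are made up, and `net_scatter` is a hypothetical name.

```python
def net_scatter(coef, rows, y, j):
    """Return net scatter points for variable j and the net line (intercept, slope).

    coef is [intercept, b1, b2, ...]; rows holds the independent variables.
    """
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(k)]
    # Intercept of the net line: all other variables held at their means.
    base = coef[0] + sum(coef[i + 1] * means[i] for i in range(k) if i != j)
    slope = coef[j + 1]
    points = []
    for r, obs in zip(rows, y):
        fitted = coef[0] + sum(c * v for c, v in zip(coef[1:], r))
        residual = obs - fitted
        # Plot the isolated variable against the net line plus the residual.
        points.append((r[j], base + slope * r[j] + residual))
    return points, (base, slope)

# Made-up sample: (RPM in thousands, shaft horsepower in hundreds), cost in $ thousands.
rows = [(6.0, 9.0), (6.5, 11.0), (7.2, 12.0), (7.0, 16.0),
        (8.1, 15.0), (8.4, 20.0), (9.0, 22.0)]
cost_k = [57.5, 61.5, 70.3, 77.5, 80.4, 92.6, 101.5]
coef = [10.0, 4.0, 2.5]   # assumed tentative CER, not a fitted result

points, (base, slope) = net_scatter(coef, rows, cost_k, j=1)
for x, net_cost in points:
    print(f"shaft hp (hundreds) {x:5.1f}  net cost {net_cost:7.2f}  line {base + slope * x:7.2f}")
```

If the plotted points bend systematically away from the straight net line, that is the visual clue that a transformation of the isolated variable should be tried in the next regression.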

Figure 7 shows the net scatter diagram for revolutions per
minute. This chart indicates that the linear (untransformed)
function of revolutions per minute should also be considered
as a possible variable.


Figure  7 

Net Scatter Diagram of Revolutions Per Minute vs. Cost

The net scatter diagrams can be used to determine what,
if any, new functions of variables should be tried in the
multiple regression. Otherwise, net effects of a variable
can be buried by the interactions of other variables and a good
relationship will be ignored. In this case the two best
variables were RPM and the square root of shaft horsepower.

Of course, more than two independent variables may be used as
necessary. After assuming these two variables as "best"
there arise two statistical problems associated with the use
of these two particular variables. First, as the number of
variables used in an equation increases, the statistical
confidence in the equation decreases. This means that, for a
sample of fixed size, an equation with two variables is not
likely to be as statistically significant as an equation with
one variable. Second, there is a high degree of correlation
between these two particular independent variables. This means,
for example, that RPM may "cause" shaft horsepower as well as
cost. The result is an equation with two unknowns. The only
significant correlation allowable for an independent variable
is with cost. These statistical problems usually would cause
further search for more independent variables. In this example
there is another solution. An alternative source of causative
variables is an engineering combination of the two variables.
The statistical problems above do not occur if two variables are
combined into one variable. The combination is simply treated as
one variable. And engineering relationships may, indeed, be true
cost-causing variables.
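The correlation between two candidate independent variables, and their combination into a single variable, can be checked with a short calculation. The sketch below uses made-up RPM and shaft horsepower values and a hand-written Pearson correlation; `pearson` is a hypothetical helper name and nothing here reproduces the report's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up engine characteristics (not the report's Table 1).
rpm_k = [6.0, 6.5, 7.2, 7.0, 8.1, 8.4, 9.0]       # RPM, thousands
shp = [900, 1100, 1200, 1600, 1500, 2000, 2200]   # shaft horsepower
root_shp = [math.sqrt(h) for h in shp]

# A high value here signals that the two predictors overlap heavily.
r_predictors = pearson(rpm_k, root_shp)
print(f"correlation between RPM and sqrt(shp): {r_predictors:.3f}")

# Engineering combination: one new variable replaces the two correlated ones.
combined = [r * s for r, s in zip(rpm_k, root_shp)]
print("combined variable:", [round(v, 1) for v in combined])
```

The combined variable can then be used as the single independent variable in a one-variable regression against cost.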


Figure 8 illustrates the data for the engineering
combination (multiplication of RPM and square root of shaft
horsepower), to form one new independent variable.

The regression line in Figure 8 was developed from this
combined data. It is obviously a very good fit to the data.
The equation is statistically significant using statistical
measures such as the F test. Also, four requirements are
satisfied. That is, the CER is unbiased, consistent,
efficient and sufficient. The standard deviation of the
estimate is less than ten percent of the mean value. The
cost of the TX computes to about $115,000, a figure slightly
outside the range of the known data. In this case extrapolation
beyond the range of the historical data is acceptable since the
TX is only a small amount larger in all characteristics than the
largest known sample and because the CER fits the data so well.
That is to say, the statistical variation is very small so the
prediction interval is also small, even when extrapolated a
small amount.
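The significance checks named above can be sketched for a one-variable CER. The data are invented for illustration (seven values of a combined variable and costs in thousands of dollars), as is the value assumed for the new engine, `x_tx`. The two critical values are the standard tabulated 5 percent points for 1 and 5 degrees of freedom; they apply only because this made-up sample has seven observations.

```python
import math

F_CRIT_1_5 = 6.61   # tabulated 5 percent point of F with (1, 5) degrees of freedom
T_CRIT_5 = 2.571    # tabulated two-sided 5 percent point of Student's t, 5 d.f.

# Made-up data: combined variable and observed cost ($ thousands).
x = [180.0, 216.0, 249.0, 280.0, 314.0, 376.0, 422.0]
cost_k = [52.0, 60.0, 66.0, 75.0, 81.0, 96.0, 106.0]
x_tx = 460.0        # assumed value of the combined variable for the new engine

n = len(x)
mean_x = sum(x) / n
mean_y = sum(cost_k) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cost_k))
b = sxy / sxx
a = mean_y - b * mean_x

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, cost_k))   # unexplained
ssr = sum(((a + b * xi) - mean_y) ** 2 for xi in x)               # explained
mse = sse / (n - 2)
f_stat = ssr / mse                    # regression has 1 degree of freedom
se = math.sqrt(mse)                   # standard error of the estimate
se_pct = se / mean_y

pred = a + b * x_tx
# Half-width of the 95 percent prediction interval for a single new engine.
half = T_CRIT_5 * se * math.sqrt(1 + 1 / n + (x_tx - mean_x) ** 2 / sxx)

print(f"F = {f_stat:.0f} (critical value {F_CRIT_1_5})")
print(f"standard error = {100 * se_pct:.1f} percent of mean cost")
print(f"predicted cost = {pred:.1f} +/- {half:.1f} ($ thousands)")
```

The prediction interval widens as `x_tx` moves away from the sample mean, which is why only a small extrapolation beyond the known data is defensible.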

This technique is used by the Cost Analysis Branch to
prepare credible cost estimates without expending extensive
resources in the process. In review, the process involves
the following major activities:

1. Collect and analyze the relevant data on only
those systems for which data are analogous.

2. Hypothesize relationships affecting cost and plot
scatter diagrams.

3. Test promising variables using multiple variable
regression analysis and plot net scatter diagrams.

4. Find the best relationships and compute a CER and
test for statistical significance.

5. Insure that the CER is logical, reasonable and
useful before publishing.

APPENDIX


The title "Linear Regression" does not mean that nonlinear
relationships cannot be considered. Nonlinear functions
such as square roots, log functions, etc., can be transformed
for use in a linear equation by simple methods such as those
illustrated in AMC Pamphlet 706-110 (reference 5 to this
report).

In general, the procedure is this:
if $Y = a + b\,[f(x)]$,
set $f(x) = x'$ and solve for the regression equation
in the normal manner. The "assumption of linearity" is basic
to linear regression. An "error in specification" results if
this assumption is not valid. In a "quick-reaction" context,
the only test for error in specification is the use of net
scatter diagrams.